A source-to-source translator, source-to-source compiler (S2S compiler), transcompiler, or transpiler is a type of translator that takes the source code of a program written in a programming language as its input and produces equivalent source code in the same or a different programming language. A source-to-source translator converts between programming languages that operate at approximately the same level of abstraction, while a traditional compiler translates from a higher-level programming language to a lower-level programming language. For example, a source-to-source translator may translate a program from Python to JavaScript, while a traditional compiler translates from a language like C to assembler or Java to bytecode.
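As an illustration of the definition above, the following is a minimal sketch of a Python-to-JavaScript translator for a small subset of arithmetic expressions. It uses Python's standard `ast` module to parse the input; the names `to_js`, `transpile_expression` and `_SAME_IN_JS` are hypothetical and chosen for this example only. It assumes the operands are ordinary numbers, and it ignores the differences between Python integers and JavaScript numbers.

```python
import ast

# Python binary operators whose JavaScript spelling is identical
# (assumption of this sketch: operands are plain numbers).
_SAME_IN_JS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def to_js(node: ast.AST) -> str:
    """Translate a small subset of Python expression syntax to JavaScript."""
    if isinstance(node, ast.Expression):
        return to_js(node.body)
    if isinstance(node, ast.Name):
        return node.id
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return repr(node.value)
    if isinstance(node, ast.BinOp):
        left, right = to_js(node.left), to_js(node.right)
        if isinstance(node.op, ast.Pow):
            # Python's ** has no identical operator in older JavaScript.
            return f"Math.pow({left}, {right})"
        if isinstance(node.op, ast.FloorDiv):
            # Python's // rounds toward negative infinity, like Math.floor.
            return f"Math.floor({left} / {right})"
        op = _SAME_IN_JS.get(type(node.op))
        if op is not None:
            return f"({left} {op} {right})"
    # Anything outside the supported subset is rejected, not guessed at.
    raise NotImplementedError(f"unsupported construct: {ast.dump(node)}")


def transpile_expression(source: str) -> str:
    """Parse one Python expression and return JavaScript source text."""
    return to_js(ast.parse(source, mode="eval"))


print(transpile_expression("x ** 2 + y // 3"))
```

Both input and output are human-readable source text at roughly the same level of abstraction, which is what distinguishes this from compiling to assembler or bytecode.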
An automatic parallelizing compiler will frequently take in a high level language program as an input and then transform the code and annotate it with parallel code annotations (e.g., OpenMP) or language constructs (e.g. Fortran's forall statements).
Another purpose of source-to-source compiling is translating legacy code to use the next version of the underlying programming language or an API that breaks backward compatibility. In this case the translator performs automatic code refactoring, which is useful when the programs to refactor are outside the control of the original implementer (for example, converting programs from Python 2 to Python 3, or converting programs from an old API to the new API) or when the size of the program makes it impractical or time-consuming to refactor it by hand.
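The API-migration case can be sketched with Python's standard `ast` module (Python 3.9 or later for `ast.unparse`). The class `RenameCall`, the function `migrate` and the function names `legacy_sum` and `checked_sum` are hypothetical, introduced only for this example. Because `ast.unparse` discards comments and original formatting, this is an illustration of the idea rather than a production migration tool.

```python
import ast


class RenameCall(ast.NodeTransformer):
    """Rewrite calls to a plain function name, leaving other uses alone."""

    def __init__(self, old: str, new: str) -> None:
        self.old, self.new = old, new

    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)  # rewrite nested calls in the arguments first
        if isinstance(node.func, ast.Name) and node.func.id == self.old:
            replacement = ast.Name(id=self.new, ctx=ast.Load())
            node.func = ast.copy_location(replacement, node.func)
        return node


def migrate(source: str, old: str, new: str) -> str:
    """Return source text in which calls to `old` are replaced by `new`."""
    tree = RenameCall(old, new).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


print(migrate("total = legacy_sum(xs) + legacy_sum(ys)", "legacy_sum", "checked_sum"))
```

Only call sites are rewritten; a variable that merely shares the old name is left unchanged, which is the kind of distinction a textual search-and-replace cannot make.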
Transcompilers may either keep translated code structure as close to the source code as possible to ease development and debugging of the original source code or may change the structure of the original code so much that the translated code does not look like the source code.
There are also debugging utilities that map the transcompiled source code back to the original code; for example, the JavaScript Source Map standard allows mapping of the JavaScript code executed by a web browser back to the original source when the JavaScript code was, for example, minified or produced by a transcompiled-to-JavaScript language. Examples include Closure Compiler, CoffeeScript, Dart, Haxe, Opal, TypeScript and Emscripten.
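The idea behind such a mapping can be sketched as follows. This is not the Source Map file format itself, which stores encoded position segments. It is a simplified line-level table, and the names `strip_and_map` and `original_line` are hypothetical. The "translator" here only drops blank lines and `//` comment lines and trims whitespace.

```python
def strip_and_map(source: str):
    """Strip blank and //-comment lines; record where each kept line came from.

    Returns (generated_text, line_map), where line_map[i] is the 1-based
    original line number of generated line i + 1.
    """
    out_lines = []
    line_map = []
    for number, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("//"):
            continue  # dropped lines have no counterpart in the output
        out_lines.append(stripped)
        line_map.append(number)
    return "\n".join(out_lines), line_map


def original_line(line_map, generated_line: int) -> int:
    """Map a 1-based line number in the generated text back to the source."""
    return line_map[generated_line - 1]


source = "// greet\n\nfunction greet() {\n  return 'hi';\n}\n"
generated, line_map = strip_and_map(source)
print(original_line(line_map, 2))
```

A debugger that stops on a line of the generated text can use such a table to show the corresponding line of the original file instead.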
Assembly language translators
So-called assembly language translators are a class of source-to-source translators converting code from one assembly language into another, including (but not limited to) across different processor families and system platforms.
Intel CONV86
Intel marketed their 16-bit processor 8086 as source compatible with the 8080, an 8-bit processor. To support this, Intel had an ISIS-II-based translator from 8080 to 8086 source code named CONV86 (also referred to as CONV-86 and CONVERT 86) available to OEM customers since 1978, possibly the earliest program of this kind. It supported multiple levels of translation and ran at 2 MHz on an Intel Microprocessor Development System MDS-800 with 8-inch floppy drives. According to user reports, it did not work very reliably.
SCP TRANS86
Seattle Computer Products (SCP) offered TRANS86.COM, written by Tim Paterson in 1980 while developing 86-DOS. The utility could translate Intel 8080 and Zilog Z80 assembly source code (with Zilog/Mostek mnemonics) into source code for the Intel 8086 (in a format only compatible with SCP's cross-assembler ASM86 for CP/M-80), but supported only a subset of opcodes, registers and modes, and often still required significant manual correction and rework afterwards. Also, performing only a mere transliteration, the brute-force single-pass translator did not carry out any register and jump optimizations. It took about 24 KB of RAM. The SCP version 1 of TRANS86.COM ran on Z80-based systems. Once 86-DOS was running, Paterson, in a self-hosting-inspired approach, utilized TRANS86 to convert itself into a program running under 86-DOS. Numbered version 2, this was named TRANS.COM instead. Later in 1982, the translator was apparently also available from Microsoft.
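A single-pass, instruction-by-instruction transliteration of this kind can be sketched as below. The register mapping (A to AL, B/C to CH/CL, D/E to DH/DL, H/L to BH/BL, M to [BX]) is the conventional 8080-to-8086 correspondence and is an assumption here; the actual tables used by TRANS86 are not given in the text. The names `translate_line`, `REGISTERS`, `ACCUMULATOR_OPS` and `UNARY_OPS` are hypothetical, the output syntax is generic Intel-style rather than SCP's ASM86 format, and labels are not handled.

```python
# Conventional 8080 -> 8086 register correspondence (assumed for this sketch).
REGISTERS = {"A": "AL", "B": "CH", "C": "CL", "D": "DH",
             "E": "DL", "H": "BH", "L": "BL", "M": "[BX]"}

# 8080 accumulator instructions and their 8086 two-operand equivalents.
ACCUMULATOR_OPS = {"ADD": "ADD", "SUB": "SUB", "ANA": "AND",
                   "ORA": "OR", "XRA": "XOR", "CMP": "CMP"}
UNARY_OPS = {"INR": "INC", "DCR": "DEC"}


def _reg(name: str, sized: bool = False) -> str:
    """Map an 8080 register name; add BYTE PTR where the size is ambiguous."""
    target = REGISTERS[name.upper()]
    if sized and target.startswith("["):
        return "BYTE PTR " + target
    return target


def translate_line(line: str) -> str:
    """Translate one 8080 source line (no label) to 8086 source text."""
    code, sep, comment = line.partition(";")
    fields = code.split(None, 1)
    if not fields:  # blank or comment-only line passes through unchanged
        return line.rstrip()
    mnemonic = fields[0].upper()
    operands = [o.strip() for o in fields[1].split(",")] if len(fields) > 1 else []
    if mnemonic == "MOV":
        out = f"MOV {_reg(operands[0])},{_reg(operands[1])}"
    elif mnemonic == "MVI":
        out = f"MOV {_reg(operands[0], sized=True)},{operands[1]}"
    elif mnemonic in ACCUMULATOR_OPS:
        out = f"{ACCUMULATOR_OPS[mnemonic]} AL,{_reg(operands[0])}"
    elif mnemonic in UNARY_OPS:
        out = f"{UNARY_OPS[mnemonic]} {_reg(operands[0], sized=True)}"
    elif mnemonic in ("JMP", "CALL", "RET"):
        out = " ".join([mnemonic] + operands)
    else:
        # Like the tools described above, only a subset is supported.
        raise NotImplementedError(mnemonic)
    return out + (" ;" + comment.rstrip() if sep else "")


for src in ["MVI B,10H ; counter", "MOV A,M", "XRA A", "INR M", "JMP LOOP"]:
    print(translate_line(src))
```

Each line is translated in isolation, so no register or jump optimization is possible, and any unsupported instruction has to be fixed by hand.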
Sorcim TRANS86
Sorcim offered an 8080 to 8086 translator, also named TRANS86, from December 1980. Like SCP's program, it was designed to port CP/M-80 application code (in ASM, MAC, RMAC or ACT80 assembly format) to MS-DOS (in a format compatible with ACT86). In ACT80 format it also supported a few Z80 mnemonics. The translation occurred on an instruction-by-instruction basis with some optimization applied to conditional jumps. The program ran under CP/M-80, MP/M-80 and Cromemco DOS with a minimum of 24 KB of RAM, and had no restrictions on the source file size.
Digital Research XLT86
Much more sophisticated, and the first to introduce optimizing compiler technologies into the source translation process, was Digital Research's XLT86 1.0 in September 1981. XLT86 1.1 was available by April 1982. The program was written by Gary Kildall and translated source code for the Intel 8080 processor (in a format compatible with ASM, MAC or RMAC assemblers) into source code for the 8086 (compatible with ASM86). Using global data flow analysis on 8080 register usage, the five-phase multi-pass translator would also optimize the output for code size and take care of calling conventions (CP/M-80 BDOS calls were mapped into BDOS calls for CP/M-86), so that CP/M-80 and MP/M-80 programs could be ported to the CP/M-86 and MP/M-86 platforms automatically. XLT86.COM itself was written in PL/I-80 for CP/M-80 platforms. The program occupied 30 KB of RAM for itself plus additional memory for the program graph. On a 64 KB memory system, the maximum source file size supported was about 6 KB, so that larger files had to be broken down accordingly before translation. Alternatively, XLT86 was also available for DEC VAX/VMS. Although XLT86's input and output worked on source-code level, the translator's in-memory representation of the program and the applied code optimizing technologies laid the foundation for binary recompilation.
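The text does not describe XLT86's algorithms in detail. As a general illustration of global data flow analysis on register usage, the following sketch computes register liveness over a small program graph with the textbook iterative method. The function name `liveness`, the block names and the example program are hypothetical. Each instruction is modelled only by the set of registers it writes and the set it reads.

```python
def liveness(blocks, successors):
    """Iterative backward liveness analysis over a control-flow graph.

    blocks:     block name -> list of (defs, uses) register sets, in order
    successors: block name -> list of successor block names
    Returns (live_in, live_out), each mapping block name -> set of registers.
    """
    live_in = {name: set() for name in blocks}
    live_out = {name: set() for name in blocks}
    changed = True
    while changed:  # repeat until a fixed point is reached
        changed = False
        for name, instructions in blocks.items():
            # A register is live on exit if any successor needs it on entry.
            out = set().union(*(live_in[s] for s in successors.get(name, ())))
            live = set(out)
            for defs, uses in reversed(instructions):
                live = (live - defs) | uses
            if out != live_out[name] or live != live_in[name]:
                live_out[name], live_in[name] = out, live
                changed = True
    return live_in, live_out


# Hypothetical three-block program: entry -> loop -> (loop | exit)
blocks = {
    "entry": [({"A"}, set()), ({"B"}, {"A"})],  # A := const; B := f(A)
    "loop": [({"A"}, {"A", "B"})],              # A := g(A, B)
    "exit": [(set(), {"A"})],                   # use A
}
successors = {"entry": ["loop"], "loop": ["loop", "exit"]}
live_in, live_out = liveness(blocks, successors)
print(sorted(live_out["entry"]))
```

A translator with this information knows which register values are still needed at each point, so it can avoid emitting code that preserves values nobody reads, which is one way such analysis can reduce output size.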
Others
2500 AD Software offered an 8080 to 8086 source-code translator as part of their XASM suite for CP/M-80 machines with Z80 as well as for Zilog ZEUS and Olivetti PCOS systems.
Since 1979, Zilog offered a Z80 to Z8000 translator as part of their PDS 8000 development system. Advanced Micro Computers (AMC) and 2500 AD Software offered Z80 to Z8000 translators as well. The latter was named TRANS and was available for Z80 CP/M, CP/M-86, MS-DOS and PCOS.
The Z88DK development kit provides a Z80 to i486 source code translator targeting nasm named "to86.awk", written in 2008 by Stefano Bodrato. It is in turn based on an 8080 to Z80 converter written in 2003 by Douglas Beattie, Jr., named "toz80.awk".
In 2021, Brian Callahan wrote an 8080 CP/M 2.2 to MS-DOS source code translator targeting nasm named 8088ify.
Programming language implementations
The first implementations of some programming languages started as transcompilers, and the default implementations of some of those languages are still transcompilers. In addition to the table below, a CoffeeScript maintainer provides a list of languages that compile to JavaScript.
Porting a codebase
When developers want to switch to a different language while retaining most of an existing codebase, using a transcompiler may be better than rewriting the whole software by hand. Depending on the quality of the transcompiler, the code may or may not need manual intervention in order to work properly. This is different from "transcompiled languages", where the specifications demand that the output source code always works without modification. Any transcompiler used to port a codebase will require manual adjustment of the output source code if maximum code quality in terms of readability and platform convention is needed.
Transcompiler pipelines
A transcompiler pipeline is what results from recursive transcompiling. By stringing together multiple layers of technology, with a transcompile step between each layer, technology can be repeatedly transformed, effectively creating a distributed language independent specification.
XSLT is a general-purpose transform tool that can be used between many different technologies to create such a derivative code pipeline.
Recursive transcompiling
Recursive transcompilation (or recursive transpiling) is the process of applying the notion of transcompiling recursively, to create a pipeline of transformations (often starting from a single source of truth) which repeatedly turn one technology into another.
By repeating this process, one can turn A → B → C → D → E → F and then back into A. Some information will be preserved through this pipeline, from A all the way back to A, and that information (at an abstract level) demonstrates what each of the components A–F agree on.
In each of the different versions that the transcompiler pipeline produces, that information is preserved. It might take on many different shapes and sizes, but by the time it comes back to A, having been transcompiled six times in the pipeline above, the information returns to its original state.
The information that survives the transform through each format, from A back to A, is (by definition) derivative content or derivative code.
Recursive transcompilation takes advantage of the fact that transcompilers may either keep translated code as close to the source code as possible to ease development and debugging of the original source code, or else they may change the structure of the original code so much that the translated code does not look like the source code. There are also debugging utilities that map the transcompiled source code back to the original code; for example, JavaScript source maps allow mapping of the JavaScript code executed by a web browser back to the original source in a transcompiled-to-JavaScript language.
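The round trip described above can be sketched with a three-stage pipeline. The formats and the function names (`json_to_pairs`, `pairs_to_ini`, `ini_to_json`, `run_pipeline`) are hypothetical and use simple data formats instead of programming languages to keep the example short. It assumes a flat record whose keys and values are strings containing neither "=" nor line breaks.

```python
import json
from functools import reduce


def json_to_pairs(text: str):
    """A -> B: JSON object text to a sorted list of (key, value) pairs."""
    return sorted(json.loads(text).items())


def pairs_to_ini(pairs) -> str:
    """B -> C: list of pairs to key=value lines."""
    return "\n".join(f"{key}={value}" for key, value in pairs)


def ini_to_json(text: str) -> str:
    """C -> A: key=value lines back to canonical JSON object text."""
    pairs = (line.split("=", 1) for line in text.splitlines())
    return json.dumps(dict(pairs), sort_keys=True)


def run_pipeline(stages, value):
    """Apply each transformation in turn to the output of the previous one."""
    return reduce(lambda acc, stage: stage(acc), stages, value)


original = json.dumps({"name": "demo", "version": "2"}, sort_keys=True)
result = run_pipeline([json_to_pairs, pairs_to_ini, ini_to_json], original)
print(result == original)
```

The keys and values are the information that survives every stage; anything a stage cannot represent (here, for example, nested objects or non-string values) would be lost or altered on the way round.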
See also
* a source-to-source compiler framework using explicit pattern-directed rewrite rules
* a source-to-source compiler from Fortran 77 to C
* (running IBM 1401 programs on Honeywell H200)
* a source-to-source compiler framework
Further reading
* 1984-11-11, version 1.05. (NB. The DOS executable XLT86.COM (2 KB) translates Intel 8080 assembly language source code to Intel 8086 assembly language source code. Despite its name, this implementation in 8086 assembly is not related to Digital Research's earlier and much more sophisticated XLT86.)
* (9 pages) (NB. This software translator was developed by ST and translates Motorola 6805/HC05 assembly source code in 2500AD Software format into ST7 source code. The MIGR2ST7.EXE executable for Windows is available from "MCU ON CD".)
External links
* "Our Methodology – The Source to Source Conversion Process". Micro-Processor Services, Inc. (MPS). http://www.mpsinc.com/process.html (archived 2019-05-12 at https://web.archive.org/web/20190512171423/http://www.mpsinc.com/process.html; retrieved 2020-02-01)